Epm–rt–2011-01 Levenshtein Edit Distance-based Type Iii Clone Detection Using Metric Trees
نویسندگان
چکیده
This paper presents an original technique for clone detection with metric trees using Levenshtein distance as the metric defined between two code fragments. This approach achieves a faster empirical performance. The resulting clones may be found with varying thresholds allowing type 3 clone detection. Experimental results of metric trees performance as well as clone detection statistics on an open source system are presented and give promising perspectives.
منابع مشابه
Distance Based Indexing for String Proximity Search
In many database applications involving string data, it is common to have near neighbor queries (asking for strings that are similar to a query string) or nearest neighbor queries (asking for strings that are most similar to a query string). The similarity between strings is defined in terms of a distance function determined by the application domain. The most popular string distance measures a...
متن کاملWord Similarity Calculation by Using the Edit Distance Metrics with Consonant Normalization
Edit distance metrics are widely used for many applications such as string comparison and spelling error corrections. Hamming distance is a metric for two equal length strings and Damerau-Levenshtein distance is a well-known metrics for making spelling corrections through string-to-string comparison. Previous distance metrics seems to be appropriate for alphabetic languages like English and Eur...
متن کاملLevenshtein Distances Fail to Identify Language Relationships Accurately
The Levenshtein distance is a simple distance metric derived from the number of edit operations needed to transform one string into another. This metric has received recent attention as a means of automatically classifying languages into genealogical subgroups. In this article I test the performance of the Levenshtein distance for classifying languages by subsampling three language subsets from...
متن کاملLexical similarity can distinguish between automatic and manual translations
We consider the problem of identifying automatic translations from manual translations of the same sentence. Using two different similarity metrics (BLEU and Levenshtein edit distance), we found out that automatic translations are closer to each other than they are to manual translations. We also use phylogenetic trees to provide a visual representation of the distances between pairs of individ...
متن کاملA Dialect Distance Metric Based on String and Temporal Alignment
The Levenshtein distance is an established metric to represent phonological distances between dialects. So far, this metric has usually been applied on manually transcribed word lists. In this study we introduce several extensions of the Levenshtein distance by incorporating probabilistic edit costs as well as temporal alignment costs. We tested all variants for compliance with the axioms that ...
متن کامل